Toward a Unified Retrieval Outcome Analysis Framework for Cross-Language Information Retrieval
Abstract
This paper proposes a Retrieval Outcome Analysis Framework, or ROA framework, to systematically evaluate the retrieval performance of Cross-Language Information Retrieval systems. The ROA framework goes beyond TREC-type retrieval evaluation methodology by including procedures that focus on individual queries, especially difficult queries. The framework comprises four interrelated components: (1) Overall System Performance Evaluation, (2) Query Categorization, (3) Translation Analysis, and (4) Individual Query Analysis. An example of applying the framework is discussed in detail. The author believes the proposed framework would be especially useful for the development of real-world Cross-Language Information Retrieval systems because evaluation guided by the framework has the potential to discover the causes behind poor retrieval performance.
Introduction
Cross-Language Information Retrieval (CLIR) is a special case of Information Retrieval (IR). It explores solutions for finding relevant documents in a collection written in a language, or languages, different from that of the users' queries. A CLIR system often behaves quite differently in response to different queries: for some queries the system retrieves relevant documents or web pages at the top of the ranking, but for others it fails to find any relevant documents or ranks them very low. In the latter case, users either cannot obtain the information they need or have to work through a long list of returned documents to locate what they want.
CLIR evaluation is an essential part of CLIR system design and development. A well-designed evaluation guided by sound methodology should be able to identify the strengths and weaknesses of the system, especially the causes of unsatisfactory retrieval performance in response to certain queries, and to provide evidence for system improvement.
However, current CLIR evaluation focuses more on average performance over multiple topics than on individual topics, just like monolingual IR system evaluation, as Hu, Bandhakavi, and Zhai (2003) have pointed out. Few systems or researchers have performed systematic, in-depth analysis of individual queries or topics. In particular, researchers have paid little attention to difficult queries or topics, for which IR or CLIR systems fail to find relevant documents or answers, or rank them very low. Consequently, little is known about why some queries are more difficult than others. Current IR evaluation as conducted by TREC (http://trec.nist.gov/) may help a system improve its overall performance, but it has limited effect on certain difficult queries because current TREC evaluations lack methods for performing in-depth retrieval analysis.
The researcher believes that it is necessary to explore the methodological issues of conducting analysis at the individual query level in order to understand the causes behind IR system performance. Such an investigation would benefit IR systems, especially real-world information access and retrieval systems, by allowing system designers to adjust their retrieval and user interaction strategies to provide better service to their users.
In this paper, the author introduces a concept called Retrieval Outcome Analysis (ROA). ROA refers to a series of analytical procedures that systematically evaluate information retrieval on individual queries. In contrast to the traditional, TREC-like IR system evaluation paradigm, ROA focuses on exploring the causes behind retrieval performance on individual queries. A well-designed ROA should provide more than precision and recall scores: it should supply evidence that explains why a system performs well on certain topics and poorly on others.
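The contrast between averaged scores and per-query outcomes can be made concrete with a small sketch. It is not the ROA framework itself, only an illustration of why a single mean can hide difficult queries. The function names, the toy rankings and relevance judgments, and the 0.1 difficulty threshold are all assumptions introduced here for illustration.

```python
from statistics import mean


def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: mean of precision at each relevant hit."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def per_query_report(runs, qrels, threshold=0.1):
    """Return per-query AP, their mean (MAP), and queries below the threshold.

    `threshold` is a hypothetical cutoff for calling a query "difficult".
    """
    scores = {qid: average_precision(runs.get(qid, []), qrels[qid]) for qid in qrels}
    map_score = mean(scores.values())
    difficult = sorted(qid for qid, ap in scores.items() if ap < threshold)
    return scores, map_score, difficult


# Toy data (invented): ranked document ids per query and relevance judgments.
runs = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d9", "d8", "d7"],
}
qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}

scores, map_score, difficult = per_query_report(runs, qrels)
print(scores, map_score, difficult)
```

In this toy run the mean is moderate even though one query retrieves nothing relevant; listing the flagged queries is the starting point for the kind of individual analysis the text argues for.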
To demonstrate the usefulness of ROA and the procedures involved in it, the author proposes an ROA framework as a methodology for CLIR system evaluation. The ROA framework, which is built upon the ROA concept, is presented and illustrated in the remainder of this paper. The next section, “Related Research,” reviews current IR system evaluation strategies and studies that have contributed to IR or CLIR performance analysis methodologies. The following section presents the ROA framework for CLIR. The fourth section provides
Similar resources
Semantic annotation for concept-based cross-language medical information retrieval
We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech ...
Cross-Lingual Medical Information Retrieval through Semantic Annotation
We present a framework for concept-based, cross-lingual information retrieval (CLIR) in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data, whereby documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes ...
Image Retrieval Using Dynamic Weighting of Compressed High Level Features Framework with LER Matrix
In this article, a fabulous method for database retrieval is proposed. The multi-resolution modified wavelet transform for each of image is computed and the standard deviation and average are utilized as the textural features. Then, the proposed modified bit-based color histogram and edge detectors were utilized to define the high level features. A feedback-based dynamic weighting of shap...
Matching Meaning for Cross-Language Information Retrieval
This article describes a framework for cross-language information retrieval that efficiently leverages statistical estimation of translation probabilities. The framework provides a unified perspective into which some earlier work on techniques for cross-language information retrieval based on translation probabilities can be cast. Modeling synonymy and filtering translation probabilities using ...
Structured queries, language modeling, and relevance modeling in cross-language information retrieval
Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in a...